This report provides a brief summary of the methods used and of the diagnostic plots generated for your experiment.
The fastq files are available for download upon request.
Users receiving files from RSCshare are advised to delete all files once they have securely copied them to their own drives.
A copy of your data will be securely archived on our end.
The MultiQC HTML report (provided as a separate file) summarises the alignment statistics, duplication rates, flagstats and other useful quality-control metrics for all samples. Alignment and QC statistics were generated with bwa mem and samtools.
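The raw fastq reads were first processed with Trim Galore to trim low-quality 3' ends (NextSeq-specific quality trimming with a cutoff of 20), remove the Illumina adapter sequence AGATCGGAAGAGC, discard read pairs in which either read became shorter than 50 bp, and run FastQC on the trimmed reads: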
trim_galore --nextseq 20 --gzip --length 50 -O 1 -a AGATCGGAAGAGC -j 1 -e 0.1 --paired --fastqc
The trimmed reads were then aligned with bwa mem to the `r params$genome` reference genome (`r params$annot` annotation):
bwa mem \
-R "@RG\tID:${name}\tSM:${name}\tPL:ILLUMINA\tLB:${name}\tPU:1" \
${genome} $i | samtools view -@ 24 -b -h -F 0x0100 -O BAM -o ${name}.bam
Duplicates in these primary BAM files were then marked using Picard MarkDuplicates.
Reads flagged as duplicates and reads mapping to the mitochondrial genome (MT) were removed, and the resulting filtered BAM files (duplicates and MT reads removed) were used for all downstream analyses.
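The duplicate and mitochondrial filtering described above can be sketched as a single-pass SAM-text filter. This is an assumption about how such a step might be implemented, not the exact command used for this report: the function name `filter_dup_mt` is hypothetical, the mitochondrial contig names `chrM`/`MT` are assumed and should match the reference genome, and the sketch relies on Picard having set SAM FLAG bit 0x400 (1024) on duplicate reads.

```shell
# Hypothetical helper: filter_dup_mt reads SAM text on stdin and writes SAM text on stdout.
filter_dup_mt() {
  awk -F '\t' '
    /^@/ { print; next }                     # keep every header line unchanged
    {
      is_dup = int(($2 + 0) / 1024) % 2      # FLAG bit 0x400 (1024): marked duplicate
      if (is_dup == 1) next                  # drop reads Picard marked as duplicates
      if ($3 == "chrM" || $3 == "MT") next   # drop reads on the mitochondrial contig (assumed names)
      print                                  # keep everything else
    }'
}
```

Usage, with hypothetical file names: `samtools view -h marked.bam | filter_dup_mt | samtools view -b -o filtered.bam -`. On its own, `samtools view -F 1024` already drops duplicate-flagged reads; the awk form simply makes both filters explicit in one pass. Filtering record by record can leave a read whose mate mapped to the mitochondrial contig without its partner.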
MACS2 was used to call peaks on the filtered BAM files with the following parameters:
macs2 callpeak -t ${bam} \
-f BAMPE \
-n ${name} \
-g hs \
-q 0.05 \
--nomodel --shift 37 --extsize 73 \
--keep-dup all
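The `-q 0.05` cutoff is applied to the q-values MACS2 reports, which are written as -log10(q) in column 9 of the `${name}_peaks.narrowPeak` output. As a sketch, assuming a stricter peak set were wanted after calling, the hypothetical helper below keeps only peaks whose q-value falls below a chosen cutoff; it is an illustration, not a step performed for this report.

```shell
# Hypothetical helper: filter_peaks_by_q keeps narrowPeak lines with q-value below a cutoff.
# Column 9 of a MACS2 narrowPeak file is -log10(q-value).
filter_peaks_by_q() {
  local q_cutoff="$1"   # e.g. 0.01, stricter than the -q 0.05 used at calling time
  awk -F '\t' -v q="$q_cutoff" '
    BEGIN { min_score = -log(q) / log(10) }   # convert the q cutoff to the -log10 scale
    ($9 + 0) > min_score                      # q < cutoff  <=>  -log10(q) > -log10(cutoff)
  '
}
```

Usage, with a hypothetical output name: `filter_peaks_by_q 0.01 < "${name}_peaks.narrowPeak" > "${name}_peaks.q001.narrowPeak"`.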
After normalization, the median counts should be consistent across samples and more similar between biological replicates.
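Assuming the counts were normalized with DESeq2's default median-of-ratios method (the exact settings used here are not stated), the size factor $s_j$ of sample $j$ is the median, over features, of each count divided by that feature's geometric mean across all $m$ samples. Here $K_{ij}$ is the raw count of feature $i$ in sample $j$, and features with a zero count in any sample are left out of the median:

$$
s_j = \underset{i}{\operatorname{median}} \; \frac{K_{ij}}{\left( \prod_{v=1}^{m} K_{iv} \right)^{1/m}}, \qquad \tilde{K}_{ij} = \frac{K_{ij}}{s_j}
$$

For example, if sample 2 were sequenced exactly twice as deeply as sample 1 (every $K_{i2} = 2K_{i1}$), each geometric mean is $\sqrt{2}\,K_{i1}$, giving $s_1 = 1/\sqrt{2} \approx 0.71$ and $s_2 = \sqrt{2} \approx 1.41$. Dividing by these size factors makes the normalized counts, and therefore the medians, identical in both samples.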
A Euclidean distance is computed between samples, and the dendrogram is built using the Ward criterion. We expect this dendrogram to group replicates together and to separate the biological conditions.
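The two ingredients can be written as follows, assuming the clustering is run on transformed (for example variance-stabilized) counts $y_{ij}$ for feature $i$ in sample $j$. The distance between samples $j$ and $k$ is the Euclidean distance, and at each step Ward's method merges the two clusters $A$ and $B$ whose union gives the smallest increase in within-cluster sum of squares, where $\bar{\mathbf{y}}_A$ is the centroid of cluster $A$:

$$
d(j,k) = \sqrt{\sum_{i} \left( y_{ij} - y_{ik} \right)^2}, \qquad \Delta(A,B) = \frac{|A|\,|B|}{|A| + |B|} \left\lVert \bar{\mathbf{y}}_A - \bar{\mathbf{y}}_B \right\rVert^2
$$

Replicates that lie close together in this space are merged early, which is why they are expected to share a branch of the dendrogram.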
Another way of visualizing the experiment variability is to look at the first principal components of the PCA. In this figure, the first principal component (PC1) is expected to separate samples from the different biological conditions, meaning that the biological variability is the main source of variance in the data.
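In the usual formulation (an assumption about this report's exact settings, such as how many of the most variable features are retained), let $Y$ be the samples-by-features matrix of transformed counts with each feature's column centred. PC1 is the unit direction along which the projected samples have the largest variance, and the percentage typically reported for PC1 is its eigenvalue's share of the total:

$$
\mathbf{w}_1 = \underset{\lVert \mathbf{w} \rVert = 1}{\arg\max} \; \operatorname{Var}(Y\mathbf{w}), \qquad \text{variance explained by PC1} = \frac{\lambda_1}{\sum_{k} \lambda_k}
$$

where $\lambda_1 \ge \lambda_2 \ge \dots$ are the eigenvalues of the sample covariance matrix of $Y$.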
The above figure shows the MA-plot for each comparison performed, with differentially expressed features highlighted in red. An MA-plot represents the log ratio of differential expression as a function of the mean intensity of each feature. Triangles correspond to features whose log2(FC) is too low or too high to be displayed within the plot limits.
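For reference, the two axes can be written as follows, assuming DESeq2's conventions: $\tilde K_{ij}$ is the normalized count of feature $i$ in sample $j$, $m$ is the number of samples, and the fold change compares condition 2 with reference condition 1. The log2 fold change reported by DESeq2 comes from its fitted model, so the ratio of condition means is only an approximation, and the mean normalized count (DESeq2's `baseMean`) is usually plotted on a log scale:

$$
M_i = \log_2 \mathrm{FC}_i \approx \log_2 \frac{\bar{\tilde K}_i^{(2)}}{\bar{\tilde K}_i^{(1)}}, \qquad A_i = \frac{1}{m} \sum_{j=1}^{m} \tilde K_{ij}
$$

For example, a feature averaging 100 normalized counts in condition 1 and 400 in condition 2 has $M_i \approx \log_2 4 = 2$; with equal numbers of replicates per condition, $A_i = 250$.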
citation("DESeq2")
##
## Love, M.I., Huber, W., Anders, S. Moderated estimation of fold change
## and dispersion for RNA-seq data with DESeq2 Genome Biology 15(12):550
## (2014)
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2},
## author = {Michael I. Love and Wolfgang Huber and Simon Anders},
## year = {2014},
## journal = {Genome Biology},
## doi = {10.1186/s13059-014-0550-8},
## volume = {15},
## issue = {12},
## pages = {550},
## }
citation("SARTools")
##
## Hugo Varet, Loraine Brillet-Guéguen, Jean-Yves Coppée and
## Marie-Agnès Dillies (2016): SARTools: A DESeq2- and EdgeR-Based R
## Pipeline for Comprehensive Differential Analysis of RNA-Seq Data.
## PLoS One, 2016, doi: http://dx.doi.org/10.1371/journal.pone.0157022
##
## A BibTeX entry for LaTeX users is
##
## @Article{,
## title = {SARTools: A DESeq2- and EdgeR-Based R Pipeline for Comprehensive Differential Analysis of RNA-Seq Data},
## author = {Hugo Varet and Loraine Brillet-Guéguen and Jean-Yves Coppée and Marie-Agnès Dillies},
## year = {2016},
## journal = {PLoS One},
## doi = {10.1371/journal.pone.0157022},
## url = {http://dx.doi.org/10.1371/journal.pone.0157022},
## }